Data Preload for Superscalar
Abstract
… decreased the average number of clock cycles per instruction. As a result, each execution cycle has become more significant to overall system performance. To maximize the effectiveness of each cycle, one must expose instruction-level parallelism and employ memory latency tolerant techniques. However, without special architecture support, a superscalar compiler cannot effectively accomplish these two tasks in the presence of control and memory access dependences. Preloading is a class of architectural support which allows memory reads to be performed early in spite of potential violation of control and memory access dependences. With preload support, a superscalar compiler can perform more aggressive code reordering to provide increased tolerance of cache and memory access latencies and increased instruction-level parallelism. This thesis discusses the architectural features and compiler support required to effectively utilize preload instructions to increase the overall system performance.
The first hardware support is preload register update, a data preload support for load scheduling to reduce first-level cache hit latency. Preload register update keeps the load destination registers coherent when load instructions are moved past store instructions that reference the same location. With this addition, superscalar processors can more effectively tolerate longer data access latencies.
The second hardware support is the memory conflict buffer. The memory conflict buffer extends preload register update support by allowing uses of the load to move above ambiguous stores. Correct program execution is maintained using the memory conflict buffer and repair code provided by the compiler. With this addition, substantial speedup over an aggressive code scheduling model is achieved for a set of control-intensive nonnumerical programs.
The last hardware support is the preload buffer. Large data sets and slow memory subsystems result in unacceptable performance for numerical programs. The preload buffer allows loads to be performed early while eliminating problems with cache pollution and extended register live ranges. Adding the prestore buffer allows loads to be scheduled in the presence of ambiguous stores. Preload buffer support in addition to cache prefetching support is shown to achieve better performance than cache prefetching alone for a set of benchmarks. In all cases, preloading decreases the bus traffic and reduces the miss rate when compared with no prefetching or cache prefetching.
ACKNOWLEDGMENTS
Discussions with Professor Wen-mei Hwu have always given me insight into the problems I am attempting to solve. He not only guided me through my research difficulties, …
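The memory conflict buffer idea in the abstract (a load and its uses are moved above an ambiguous store, and compiler-provided repair code restores correctness if the store turns out to alias) can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the thesis's hardware design: the class `MemoryConflictBuffer`, its methods `preload`, `store`, and `check`, and the per-register conflict flag are all hypothetical names and modelling choices made for illustration.

```python
class MemoryConflictBuffer:
    """Toy model (hypothetical): remembers the address of each preload
    that was hoisted above a store, and flags it if a later store hits
    the same address."""

    def __init__(self):
        self._addr_of = {}      # destination register -> preloaded address
        self._conflict = set()  # registers whose preloaded value is stale

    def preload(self, memory, reg, addr):
        # Early load: read memory now and remember which address was read.
        self._addr_of[reg] = addr
        self._conflict.discard(reg)
        return memory[addr]

    def store(self, memory, addr, value):
        # Every store writes memory and is compared against pending preloads.
        memory[addr] = value
        for reg, preloaded_addr in self._addr_of.items():
            if preloaded_addr == addr:
                self._conflict.add(reg)

    def check(self, reg):
        # Placed where the load originally stood: report whether a store
        # overwrote the preloaded location, then drop the entry.
        conflicted = reg in self._conflict
        self._addr_of.pop(reg, None)
        self._conflict.discard(reg)
        return conflicted


def run(store_addr):
    """Original order:  store 99 -> [store_addr]; r1 = load [0x10]; r2 = r1 + 1
    Scheduled order: the load and its use are hoisted above the store."""
    memory = {0x10: 1, 0x20: 2}
    mcb = MemoryConflictBuffer()

    r1 = mcb.preload(memory, "r1", 0x10)  # hoisted load
    r2 = r1 + 1                           # hoisted use of the load
    mcb.store(memory, store_addr, 99)     # ambiguous store

    if mcb.check("r1"):
        # Repair code: redo the load and everything that depended on it.
        r1 = memory[0x10]
        r2 = r1 + 1
    return r2
```

Calling `run(0x20)` models the common case where the store does not alias the load, so the hoisted result is kept; calling `run(0x10)` models the aliasing case, where the check fails and the repair path recomputes the dependent value.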
Similar resources
Aggressive Scheduling for Memory Accesses of CISC Superscalar Microprocessors
For CISC microprocessors, the proportion of memory access instructions is relatively high, and a specific address is likely to be accessed repeatedly in a short period of time because of register-to-memory or memory-to-memory instruction set architectures and limited register sets. As superscalar architectures advance, an aggressive scheduling policy for memory access becomes crucial. In this p...
Cached Data State Controller Enable
By exploiting fine-grain parallelism, superscalar processors can potentially increase the performance of future supercomputers. However, supercomputers typically have a long access delay to their first-level memory which can severely restrict the performance of superscalar processors. Compilers attempt to move load instructions far enough ahead to hide this latency. However, conventional movement ...
International Conference on Supercomputing. 1 Tolerating Data Access Latency with Register
By exploiting fine-grain parallelism, superscalar processors can potentially increase the performance of future supercomputers. However, supercomputers typically have a long access delay to their first-level memory which can severely restrict the performance of superscalar processors. Compilers attempt to move load instructions far enough ahead to hide this latency. However, conventional movement ...
Improving Database Performance on Simultaneous Multithreading Processors
Simultaneous multithreading (SMT) allows multiple threads to supply instructions to the instruction pipeline of a superscalar processor. Because threads share processor resources, an SMT system is inherently different from a multiprocessor system and, therefore, utilizing multiple threads on an SMT processor creates new challenges for database implementers. We investigate three thread-based tec...
Preload Effect on Nonlinear Dynamic Behavior of Aerodynamic Two-Lobe Journal Bearings
This paper presents the effect of preload on nonlinear dynamic behavior of a rigid rotor supported by two-lobe aerodynamic noncircular journal bearing. A finite element method is employed to solve the Reynolds equation in static and dynamical states and the dynamical equations are solved using Runge-Kutta method. To analyze the behavior of the rotor center in the horizontal and vertical directi...
Publication date: 1993